Abstract

Tries are among the most versatile and widely used data 
structures on words. They are pertinent to the (internal) 
structure of (stored) words and several splitting procedures 
used in diverse contexts ranging from document taxonomy 
to IP address lookup, from data compression (e.g., the
Lempel-Ziv'77 scheme) to dynamic hashing, from partial-match
queries to speech recognition, from leader election algorithms 
to distributed hash tables and graph compression. While
the performance of tries under a realistic probabilistic model
is of significant importance, its analysis, even for the simplest
memoryless sources, has proved difficult. Rigorous results
on inherently complex parameters have rarely been obtained
(with a few notable exceptions) under more realistic models
of string generation. In this paper we meet these challenges:
By a novel use of the contraction method combined with 
analytic techniques we prove a central limit theorem for 
the external path length of a trie under a general Markov 
source. In particular, our results apply to the Lempel-Ziv'77 
code. We envision that the methods described here will 
have further applications to other trie parameters and data 
structures. 

1 Introduction 

We study the external path length of a trie built 
over n binary strings generated by a Markov source.
More precisely, we assume that the input is a sequence 
of n independent and identically distributed random 
strings, each being composed of an infinite sequence 
of symbols such that the next symbol depends on the 
previous one and this dependence is governed by a given 
transition matrix (i.e., Markov model). 

Digital trees, in particular, tries have been inten- 
sively studied for the last thirty years [21 El El EI El 
El HS1 HSl HH HOI EH I231 12H I2H1 123 HD], mostly under 
Bernoulli (memoryless) model assumption. The typical 
depth under Markovian model was analyzed in [TBI HO] • 
Size, external path length and height under more gen- 
eral dynamical sources were studied in the seminal paper 
of Clément, Flajolet, and Vallée [2], where in particular
asymptotic expressions for expectations are identified 
as well as the asymptotic distributional behavior of the 
height, see also [3]. For further analysis of tries for prob- 
abilistic models beyond Bernoulli (memoryless) sources 
see Devroye [SI [7] . 

With respect to Markovian models, to the best 
of our knowledge, no asymptotic distributions for the 
external path length have been derived so far. It is 
well known [40] that the external path length is more 
challenging due to stronger dependency. In fact, this is 
already observed for tries under Bernoulli model [3D]. 
In this paper we establish the central limit theorem for
the external path length in a trie built over a Markov
model by a novel use of the contraction method.

Let us first briefly review the contraction method. 
It was introduced in 1991 by Uwe Rösler [34] for the
distributional analysis of the complexity of the Quicksort
algorithm. Over the last 20 years this approach, which
is based on exploiting an underlying contracting map
on a space of probability distributions, has been
developed as a fairly universal tool for the analysis of
recursive algorithms and data structures. Here, randomness
may come from a stochastic model for the input or from
randomization within the algorithm itself (randomized
algorithms). General developments of this method were
presented in 03 [3H [3H HH1 1301 [HI HOI HH EH with nu- 
merous applications in Theoretical Computer Science. 

The contraction method has been used in the analysis
of tries and other digital trees only under the symmetric
Bernoulli model (unbiased memoryless source)
[29, Section 5.3.2], where limit laws for the size and the
external path length of tries were re-derived. The
application of the method there relied heavily on the
fact that precise expansions of the expectations were
available, in particular smoothness properties of
periodic functions appearing in the linear terms as well as
bounds on error terms, which were $O(1)$ for the size and
$O(\log n)$ for the path lengths. Let us observe that even
in the asymmetric Bernoulli model such error terms
seem to be out of reach for classical analytic methods;
see the discussion in Flajolet, Roux, and Vallée [12].
Hence, for the more general Markov source model
considered in the present paper we develop a novel use of
the contraction method.

Furthermore, the contraction method applied to
Markov sources hits another snag, namely, the Markov
model is not preserved when decomposing the trie into
the left and right subtrees of the root. The initial
distribution of the Markov source is changed when
looking at these subtrees. To overcome these problems
a couple of new ideas are used for setting up the
contraction method: First of all, we will use a system of
distributional recursive equations, one for each subtree.
We then apply the contraction method to this system of
recurrences capturing the subtree processes and prove
normality for the path lengths conditioned on the initial
distribution. In fact, our approach avoids dealing with
multivariate recurrences and instead we reduce the
whole analysis to a system of one-dimensional equations.
A comparison of a multivariate approach and our new
version with systems of recurrences is drawn in Section 7.

We also need asymptotic expansions of the mean 
and the variance for applying the contraction method. 
However, in contrast to the very precise information on
periodicities of linear terms needed for the symmetric
Bernoulli model mentioned above, our convergence proof
requires only the leading order term together with a
Lipschitz continuity property for the error term.

In this extended abstract we develop the use of
systems of recursive distributional equations in the
context of the contraction method for the external path
length of tries under a general Markov source model.
In particular, we prove the central limit theorem for the
external path length, a result that had been wanting
since the Lempel-Ziv'77 code was devised in 1977. The
methodology used is general enough to cover related 
quantities and structures as well. We are confident 
that our approach also applies with minor adjustments 
at least to the size of tries, the path lengths of digital 
search trees and PATRICIA tries under the Markov 
source model as well as other more complex data 
structures on words such as suffix trees. 

Notations: Throughout this paper we use the
Bachmann-Landau symbols, in particular the big $O$
notation. We declare $x \log x := 0$ for $x = 0$, where $\log x$
denotes the natural logarithm. By $B(n,p)$ with $n \in \mathbb{N}$
and $p \in [0,1]$ the binomial distribution is denoted, by
$B(p)$ the Bernoulli distribution with success probability
$p$, by $\mathcal{N}(0,\sigma^2)$ the centered normal distribution with
variance $\sigma^2 > 0$. We use $C$ as a generic constant that
may change from one occurrence to another.

2 Tries and the Markov source model 

The Markov source: We assume binary data strings
over the alphabet $\Sigma = \{0,1\}$ generated by a homogeneous
Markov chain. In general, a homogeneous Markov
chain is given by its initial distribution $\mu = \mu_0\delta_0 + \mu_1\delta_1$
on $\Sigma$ and the transition matrix $(p_{ij})_{i,j\in\Sigma}$. Here, $\delta_x$
denotes the Dirac measure in $x \in \mathbb{R}$. Hence, the initial
state is 0 with probability $\mu_0$ and 1 with probability
$\mu_1$. We have $\mu_0, \mu_1 \in [0,1]$ and $\mu_0 + \mu_1 = 1$. A
transition from state $i$ to $j$ happens with probability $p_{ij}$,
$i,j \in \Sigma$. Now, a data string is generated as the sequence
of states visited by the Markov chain. In the Markov
source model assumed subsequently all data strings are
independent and identically distributed according to the
given Markov chain.
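The source model just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `markov_string` and `markov_sample`, the truncation of the infinite strings to a finite `length`, and the example transition matrix are hypothetical choices, not part of the paper.

```python
import random

def markov_string(mu0, p, length, rng):
    """Return the first `length` bits (length >= 1) of one string from the source.

    mu0 : probability that the first bit is 0 (initial distribution)
    p   : 2x2 transition matrix, p[i][j] = P(next bit = j | current bit = i)
    """
    bit = 0 if rng.random() < mu0 else 1
    bits = [bit]
    for _ in range(length - 1):
        # move to state 0 with probability p[bit][0], else to state 1
        bit = 0 if rng.random() < p[bit][0] else 1
        bits.append(bit)
    return bits

def markov_sample(n, mu0, p, length, seed=0):
    """n independent, identically distributed (truncated) strings."""
    rng = random.Random(seed)
    return [markov_string(mu0, p, length, rng) for _ in range(n)]
```

For instance, `markov_sample(5, 1.0, [[0.7, 0.3], [0.4, 0.6]], 20)` draws five strings that all start with bit 0, which corresponds to the initial distribution $\delta_0$.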

We always assume that $p_{ij} > 0$ for all $i,j \in \Sigma$.
Hence, the Markov chain is ergodic and has a stationary
distribution, denoted by $\pi = \pi_0\delta_0 + \pi_1\delta_1$. We have

(2.1)    $\pi_0 = \frac{p_{10}}{p_{01}+p_{10}}, \qquad \pi_1 = \frac{p_{01}}{p_{01}+p_{10}}.$

Note however, that our Markov source model does not
require the Markov chain to start in its stationary
distribution.

The case $p_{ij} = 1/2$ for all $i,j \in \Sigma$ is essentially
the symmetric Bernoulli model (only the first bit may
have a different (initial) distribution). The symmetric 
Bernoulli model has already been studied thoroughly 
also with respect to the external path length of tries, 
see [TU [231 HU- It behaves differently compared to 



the asymmetric Bernoulli model and the other Markov
source models, as the variance of the external path
length is linear with a periodic prefactor in the
symmetric Bernoulli model. In our cases we will find a
larger variance of the order $n \log n$ in Theorem 5.1
below. We exclude the symmetric Bernoulli model case
subsequently. For later reference, we summarize our
conditions as:



(2.2)    $p_{ij} \in (0,1)$ for all $i,j \in \Sigma$,
         $p_{ij} \neq \frac{1}{2}$ for some $(i,j) \in \Sigma^2$.



The entropy rate of the Markov chain plays an impor- 
tant role in the asymptotic behavior of tries. In particu- 
lar, it determines leading order constants of parameters 
of tries that are related to depths of leaves and its ex- 
ternal path length. The entropy rate for our Markov 
chain is given by 



(2.3)    $H := -\sum_{i,j\in\Sigma} \pi_i p_{ij} \log p_{ij} = \sum_{i\in\Sigma} \pi_i H_i,$

where $H_i := -\sum_{j\in\Sigma} p_{ij} \log p_{ij}$ is the entropy of a
transition from state $i$ to the next state. Thus, $H$
is obtained as the weighted average of the entropies of
all possible transitions with weights according to the
stationary distribution $\pi$.
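As a quick sanity check of the entropy rate, consider the memoryless special case, in which both rows of the transition matrix coincide. The symbols $p$ and $q$ below are introduced only for this illustration:

```latex
p_{00} = p_{10} = p, \quad p_{01} = p_{11} = q = 1 - p
\;\Longrightarrow\;
\pi_0 = \frac{p}{q + p} = p, \quad \pi_1 = q, \quad
H_0 = H_1 = H = -p \log p - q \log q .
```

So for a Bernoulli source the entropy rate reduces to the usual entropy of a single bit, as one would expect.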

Tries: For a given set of data strings over the alphabet
$\Sigma = \{0,1\}$, each data string is associated with a unique
infinite path in the infinite complete rooted binary tree
by identifying left branches with bit 0 and
right branches with bit 1. Each string is stored in the
unique node on its infinite path that is closest to the root
and does not belong to any other data path, cf. Figure 1.
It is the minimal prefix of a string that distinguishes this
string from all others; for details see the monographs of
Knuth [26], Mahmoud [27] or Szpankowski [40].
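The definition above translates into a short recursive computation of the external path length (the sum of the depths of all stored strings). The following is a minimal sketch, assuming the strings are given as finite, pairwise distinct prefixes over the characters "0" and "1" that are long enough to separate them; the function name is hypothetical.

```python
def external_path_length(strings, depth=0):
    """Sum of leaf depths of the trie built from distinct bit strings.

    A set with at most one string is stored directly at the current
    node; larger sets are split by the bit at position `depth`.
    """
    if len(strings) <= 1:
        return depth * len(strings)
    left = [s for s in strings if s[depth] == "0"]
    right = [s for s in strings if s[depth] == "1"]
    return (external_path_length(left, depth + 1)
            + external_path_length(right, depth + 1))
```

For example, the strings `"00"`, `"01"` and `"1"` are stored at depths 2, 2 and 1, so the function returns 5; a single string is stored at the root and contributes 0.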

3 Recursive Distributional Equations 

For the Markov source model a challenge is to set up the
right framework in which to analyze data structures.
We formulate in this section a system of distributional
recurrences to capture the distribution of the external
path length of tries. Our subsequent analysis is entirely
based on these equations.

Figure 1: The infinite rooted binary tree contains the
infinite paths of six strings (left). The corresponding
trie is obtained by cutting each path at the closest node
to the root that does not belong to any other path.

We denote by $L_n^\mu$ the external path length of a trie
under the Markov source model with initial distribution
$\mu$ holding $n$ data. We have $L_0^\mu = L_1^\mu = 0$ for all
initial distributions $\mu$. The transition matrix is given in
advance and suppressed in the notation. We abbreviate
$L_n^i := L_n^{\delta_i}$ for $i \in \Sigma$. Hence, $L_n^i$ refers to $n$ independent
strings all starting with bit $i$ and then following the
Markov chain. We will study $L_n^0$ and $L_n^1$. From the
asymptotic behavior of these two sequences we can
then directly obtain corresponding results for $L_n^\mu$ for
an arbitrary initial distribution $\mu = \mu_0\delta_0 + \mu_1\delta_1$ as
follows: We denote by $K_n$ the number of data among
our $n$ strings which start with bit 0. Then $K_n$ has the
binomial $B(n,\mu_0)$ distribution. The contributions of the
two subtrees of the trie to its external path length can
be represented by the following stochastic recurrence



(3.4)    $L_n^\mu \stackrel{d}{=} L^0_{K_n} + L^1_{n-K_n}, \qquad n \ge 2,$

where $\stackrel{d}{=}$ denotes that left and right hand side have
identical distributions and we have that $(L_0^0,\ldots,L_n^0)$,
$(L_0^1,\ldots,L_n^1)$ and $K_n$ are independent. We will see later
that we can directly transfer asymptotic results for $L_n^0$
and $L_n^1$ to general $L_n^\mu$ via (3.4), see, e.g., the proof of
Theorem 6.1.

For a recursive decomposition of $L_n^0$ note that we
have initial distribution $\delta_0$, thus all data strings start
with bit 0 and are inserted into the left subtree of
the root. We denote the root of this left subtree by
$w$. At node $w$ the data strings are split according
to their second bit. We denote by $I_n$ the number
of data strings having 0 as their second bit, i.e., the
number of strings being inserted into the left subtree
of $w$. The Markov source model implies that $I_n$ is
binomial $B(n,p_{00})$ distributed. The right subtree of
node $w$ then holds the remaining $n - I_n$ data strings.
Consider the left subtree of $w$ together with its root $w$.
Conditioned on its number $I_n$ of data strings inserted
it is generated by the same Markov source model as the
original trie. However, the right subtree of $w$ together
with its root $w$, conditioned on its number $n - I_n$ of data
strings, is generated by a Markov source model with the
same transition matrix but another initial distribution,
namely $\delta_1$. Moreover, by the independence of data
strings within the Markov source model, these two
subtrees are independent conditionally on $I_n$. Phrased
as a recursive distributional equation we have



(3.5)    $L_n^0 \stackrel{d}{=} L^0_{I_n} + L^1_{n-I_n} + n, \qquad n \ge 2,$

with $(L_0^0,\ldots,L_n^0)$, $(L_0^1,\ldots,L_n^1)$ and $I_n$ independent. A
similar argument yields a recurrence for $L_n^1$. Denoting
by $J_n$ a binomial $B(n,p_{11})$ distributed random variable,
we have

(3.6)    $L_n^1 \stackrel{d}{=} L^0_{n-J_n} + L^1_{J_n} + n, \qquad n \ge 2,$

with $(L_0^0,\ldots,L_n^0)$, $(L_0^1,\ldots,L_n^1)$ and $J_n$ independent.
Our asymptotic analysis of $L_n^\mu$ is based on the
distributional recurrence system (3.5)-(3.6) as well as (3.4).
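The recurrence system (3.5)-(3.6) can be sampled directly, which gives a quick way to experiment with the distribution of $L_n^i$. The sketch below is hedged in two ways: it samples from the recurrences as stated in the text (not from an explicitly built trie), and it assumes that $J_n$ in (3.6) counts the strings whose second bit is again 1, so that it is binomial with success probability $p_{11}$, in line with the limit equation (6.23). The helper names are hypothetical.

```python
import random

def binomial(n, q, rng):
    """Number of successes in n independent trials with success probability q."""
    return sum(rng.random() < q for _ in range(n))

def sample_L(n, i, p, rng):
    """One draw of L_n^i from the recurrence as stated in the text.

    p[i][i] is the probability that the bit following i is again i, so
    k strings stay with "type" i and n - k strings switch to type 1 - i.
    """
    if n <= 1:
        return 0                      # L_0 = L_1 = 0
    k = binomial(n, p[i][i], rng)
    return sample_L(k, i, p, rng) + sample_L(n - k, 1 - i, p, rng) + n
```

Averaging many draws of `sample_L(n, 0, p, rng)` for growing `n` gives an empirical impression of the growth of the mean and of the spread of the path length.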



4 Analysis of the Mean 

First we study the asymptotic behavior of the expectation
of the external path length with a precise error
term needed to derive a limit law in Section 6. The
leading order term in Theorem 4.1 below has already been
derived (even for more general models) in Clément,
Flajolet and Vallée [2].

Theorem 4.1. For the external path length of a binary
trie under the Markov source model with conditions
(2.2) we have

$\mathbb{E}[L_n^\mu] = \frac{1}{H}\, n \log n + O(n), \qquad (n \to \infty),$

with the entropy rate $H$ of the Markov chain given in
(2.3). The $O(n)$ error term is uniform in the initial
distribution $\mu$.



Our proof of Theorem 4.1 as well as the corresponding
limit law in Theorem 6.1 depend on refined properties of
the $O(n)$ error term that are first obtained for the initial
distributions $\mu = \delta_0$ and $\mu = \delta_1$ and then generalized to
arbitrary initial distributions via (3.4). For $\mu = \delta_0$ and
$\mu = \delta_1$ we denote this error term for all $n \in \mathbb{N}_0$ and
$i \in \Sigma$ by

(4.7)    $f_i(n) := \mathbb{E}[L_n^i] - \frac{1}{H}\, n \log n.$

The following Lipschitz continuity of $f_0$ and $f_1$ is crucial
for our further analysis:

Proposition 4.2. There exists a constant $C > 0$ such
that for both $i \in \Sigma$ and all $m, n \in \mathbb{N}_0$

$|f_i(m) - f_i(n)| \le C\,|m - n|.$
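To get a feeling for the error term $f_i(n)$, the means $\mathbb{E}[L_n^i]$ can be computed exactly for small $n$ by taking expectations in the recurrences (3.5)-(3.6). The following sketch does this by dynamic programming. It is an illustration only: the function names are hypothetical, and it assumes (as an interpretation matching (6.23)) that the split variable in (3.6) is binomial with success probability $p_{11}$.

```python
import math

def binom_pmf(n, k, q):
    """P(B(n, q) = k)."""
    return math.comb(n, k) * q**k * (1.0 - q)**(n - k)

def exact_means(N, p):
    """nu[i][n] = E[L_n^i] for 0 <= n <= N from the expected recurrence."""
    nu = [[0.0] * (N + 1), [0.0] * (N + 1)]
    for n in range(2, N + 1):
        known = []
        for i in (0, 1):
            q = p[i][i]
            # toll n plus all terms that only involve sizes 1..n-1
            known.append(n + sum(
                binom_pmf(n, k, q) * (nu[i][k] + nu[1 - i][n - k])
                for k in range(1, n)))
        # The split sizes k = n and k = 0 involve nu[.][n] itself,
        # which gives a 2x2 linear system solved by Cramer's rule.
        a, b = 1.0 - p[0][0]**n, (1.0 - p[0][0])**n
        c, d = (1.0 - p[1][1])**n, 1.0 - p[1][1]**n
        det = a * d - b * c
        nu[0][n] = (known[0] * d + b * known[1]) / det
        nu[1][n] = (known[1] * a + c * known[0]) / det
    return nu
```

With `nu = exact_means(200, p)` one can tabulate `nu[i][n] - n * math.log(n) / H` for the entropy rate `H` of the chosen matrix `p` and inspect how the increments of this difference behave.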



The proof of Proposition 4.2 is based on a refined
analysis of transfers from the growth of toll functions in
systems of recursive equations to the growth of the
quantities themselves. The heart of the proof of Proposition
4.2 and hence Theorem 4.1 is the following transfer
result. The proof is technical and provided in the full
paper version of this extended abstract.

Lemma 4.3. Let $(a_i(n))_{n\ge0}$ and $(\eta_i(n))_{n\ge0}$ be real
sequences and $(X_{i,n})_{n\ge2}$ sequences of binomial $B(n,p_i)$
distributed random variables with $p_i \in (0,1)$ for $i \in \Sigma$. Assume
that for constants $c_0, c_1, d_0, d_1 \in (0,1)$ with $c_0 + d_1 =
c_1 + d_0 = 1$ we have for all $n \ge 2$ and $i \in \Sigma$

(4.8)    $a_i(n) = c_i\, \mathbb{E}[a_i(X_{i,n})] + d_i\, \mathbb{E}[a_{1-i}(n - X_{i,n})] + \eta_i(n).$

If furthermore $\eta_i(n) = O(n^{-\alpha})$ for an $\alpha > 0$ and both
$i \in \Sigma$, then, as $n \to \infty$,

$a_i(n) = O(1), \qquad i \in \Sigma.$

5 Analysis of the Variance

To formulate an asymptotic expansion of the variance of
the external path length we denote by $\lambda(s)$ the largest
eigenvalue of the matrix $P(s) := (p_{ij}^{-s})_{i,j\in\Sigma}$. Note that
$\lambda$ as a function of $s$ is smooth. We denote its first and
second derivative by $\dot\lambda$ and $\ddot\lambda$ respectively. Then we
have:

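Theorem 5.1. For the external path length of a binary
trie under the Markov source model with conditions
(2.2) we have, as $n \to \infty$,

(5.9)    $\mathrm{Var}(L_n^\mu) = \sigma^2 n \log n + o(n \log n),$

where $\sigma^2 > 0$ is independent of the initial distribution $\mu$
and given by

(5.10)    $\sigma^2 = \frac{\ddot\lambda(-1) - \dot\lambda^2(-1)}{\dot\lambda^3(-1)}.$

With $H_0$ and $H_1$ defined in (2.3) we have

$\sigma^2 = \frac{\pi_0 p_{00} p_{01}}{H^3} \left( \log\frac{p_{00}}{p_{01}} + \frac{H_1 - H_0}{p_{01}+p_{10}} \right)^2
 + \frac{\pi_1 p_{10} p_{11}}{H^3} \left( \log\frac{p_{10}}{p_{11}} + \frac{H_1 - H_0}{p_{01}+p_{10}} \right)^2.$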
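The two expressions for the variance constant in Theorem 5.1 can be compared numerically. The sketch below evaluates the closed form in terms of $\pi_i$, $H_i$ and $H$, and separately approximates $\dot\lambda(-1)$ and $\ddot\lambda(-1)$ by central finite differences, taking $P(s)$ to have entries $p_{ij}^{-s}$. This is an illustration under these assumptions, not the paper's derivation; all function names and the step size `h` are hypothetical choices.

```python
import math

def stationary(p):
    """Stationary distribution (pi_0, pi_1) of a two-state chain."""
    p01, p10 = p[0][1], p[1][0]
    return p10 / (p01 + p10), p01 / (p01 + p10)

def entropies(p):
    """Return (H, H_0, H_1): entropy rate and per-state transition entropies."""
    h = [-sum(q * math.log(q) for q in row) for row in p]
    pi = stationary(p)
    return pi[0] * h[0] + pi[1] * h[1], h[0], h[1]

def sigma2_closed_form(p):
    pi = stationary(p)
    H, H0, H1 = entropies(p)
    shift = (H1 - H0) / (p[0][1] + p[1][0])
    t0 = pi[0] * p[0][0] * p[0][1] * (math.log(p[0][0] / p[0][1]) + shift) ** 2
    t1 = pi[1] * p[1][0] * p[1][1] * (math.log(p[1][0] / p[1][1]) + shift) ** 2
    return (t0 + t1) / H ** 3

def largest_eigenvalue(p, s):
    """Largest eigenvalue of the 2x2 matrix with entries p_ij^(-s)."""
    a, b = p[0][0] ** (-s), p[0][1] ** (-s)
    c, d = p[1][0] ** (-s), p[1][1] ** (-s)
    tr, det = a + d, a * d - b * c
    return (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0

def sigma2_eigenvalue(p, h=1e-4):
    """(lambda'' - lambda'^2) / lambda'^3 at s = -1 via finite differences."""
    l0 = largest_eigenvalue(p, -1.0)
    lp = largest_eigenvalue(p, -1.0 + h)
    lm = largest_eigenvalue(p, -1.0 - h)
    d1 = (lp - lm) / (2.0 * h)
    d2 = (lp - 2.0 * l0 + lm) / (h * h)
    return (d2 - d1 * d1) / d1 ** 3
```

For a memoryless source with $p_{00} = p_{10} = p$ and $q = 1 - p$, the closed form reduces to $pq \log^2(p/q) / H^3$, which vanishes exactly in the excluded symmetric case $p = 1/2$.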



We start with the analysis of the Poisson variance
of the external path length, i.e. $\tilde v_i(\lambda) := \mathrm{Var}(L^i_{N_\lambda})$,
$i \in \Sigma$, where $N_\lambda$ has the Poisson($\lambda$) distribution and
is independent of $(L_n^i)_{n\ge0}$. In the second part we
use depoissonization techniques of [19] to obtain the
asymptotic behavior of $\mathrm{Var}(L_n^i)$.



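The reason why we consider a Poisson number of
strings is that for $N_\lambda$ i.i.d. strings with initial
distribution $\delta_i$ the number $N_{\lambda p_{i0}}$ of strings whose second bit
equals 0 and the number $M_{\lambda p_{i1}}$ of strings whose second
bit equals 1 are independent and remain Poisson
distributed. Hence, in the Poisson case we obtain similarly
to (3.5) and (3.6) that for $i \in \Sigma$

(5.11)    $L^i_{N_\lambda} \stackrel{d}{=} L^0_{N_{\lambda p_{i0}}} + L^1_{M_{\lambda p_{i1}}} + N_{\lambda p_{i0}} + M_{\lambda p_{i1}} - \mathbf{1}_{\{N_{\lambda p_{i0}} + M_{\lambda p_{i1}} = 1\}},$

where $(L_n^0)_{n\ge0}$, $(L_n^1)_{n\ge0}$, $N_{\lambda p_{i0}}$ and $M_{\lambda p_{i1}}$ are
independent, $N_{\lambda p_{i0}}$ has the Poisson($\lambda p_{i0}$) distribution and
$M_{\lambda p_{i1}}$ has the Poisson($\lambda p_{i1}$) distribution. Note that
the indicator $\mathbf{1}_{\{N_{\lambda p_{i0}} + M_{\lambda p_{i1}} = 1\}}$ is necessary in order that (5.11) holds
when $\{N_\lambda = 1\}$.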

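We denote by $\tilde\nu_i(\lambda) := \mathbb{E}[L^i_{N_\lambda}]$, $i \in \Sigma$, the Poisson
expectation of the external path length, which is

$\tilde\nu_i(\lambda) = e^{-\lambda} \sum_{n=0}^{\infty} \frac{\lambda^n}{n!}\, \mathbb{E}[L_n^i].$

Note that (5.11) implies

(5.12)    $\tilde\nu_i(\lambda) = \tilde\nu_0(\lambda p_{i0}) + \tilde\nu_1(\lambda p_{i1}) + \lambda(1 - e^{-\lambda}).$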
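The toll term in the Poisson mean equation comes from the expectation of $N - \mathbf{1}_{\{N = 1\}}$ for a Poisson($\lambda$) variable $N$ (the two split counts in (5.11) are independent Poisson variables whose parameters add up to $\lambda$). The small check below sums the Poisson probability mass function numerically and can be compared with $\lambda(1 - e^{-\lambda})$. The function name and the truncation at 200 terms are arbitrary illustrative choices.

```python
import math

def poisson_toll_mean(lam, terms=200):
    """E[N - 1{N = 1}] for N ~ Poisson(lam), by summing the pmf."""
    total, pmf = 0.0, math.exp(-lam)      # pmf starts at P(N = 0)
    for n in range(terms):
        total += pmf * (n - (1 if n == 1 else 0))
        pmf *= lam / (n + 1)              # P(N = n + 1) from P(N = n)
    return total
```

The subtraction of the indicator only removes the contribution of the event that exactly one string is present, in which case the path length is zero by convention.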



We need precise information about the mean (second
order term) to derive the leading term of the variance.
We shall use analytic techniques, namely the Mellin
transform as surveyed in [40], which we discuss next. The
Mellin transform $f^*(s)$ of a real function $f(x)$ is defined
as

$f^*(s) = \int_0^\infty f(x)\, x^{s-1}\, dx.$
Let v*(s) be the Mellin transform of Vi(X). Then, by 
known properties of the Mellin transform [5D] , the func- 



tional equation (5.12 1 becomes an algebraic equation for 
i e E 

"i 00 = r ( s + !) + Pm v o( s ) +PixV*{s)- 

Define the column vector v*(s) := (vq (s), v*(s)) and 
the column vector ■j(s) :— (T(s),T(s)). Then we 
can write the latter equations as the matrix equation 
v*(s) = j(s + 1) + P(s)u*(s) that we write as 



(5.13) 



u*(s) = (I-P(s))- 1 ' Y (s + l). 



Then the Mellin transform z^*(s) of the mean external 



path length E[L* r . 1 under the Poisson model satisfies 



(5.14) v*(s)=T(s + 1) + ij(s)u*(s) 

where fi(s) := , ^ s ). 



To recover the mean external path length under the
Poisson model we need to apply singularity analysis
to (5.14). For the matrix $P(s)$, we define the principal left
eigenvector $\pi(s)$ and the principal right eigenvector $\psi(s)$
associated with the largest eigenvalue $\lambda(s)$ such that
$\langle \pi(s), \psi(s) \rangle = 1$, where we write $\langle x, y \rangle$ for the inner
product of vectors $x$ and $y$. Then by the spectral
representation [40] of $P(s)$ we find

= ?^ + o(l/(l -A(-))) 

that leads to the following asymptotic expansion around 
s = -1 



(5.15) 



-1 1 1/7 A(-l) 

A(-l)(.s + l) 2 s + l\A(-l) 2A(-1) 

1 / (A(-l)^(-l)) 



1 



A(-l) 



1 



0(1) 



where $\dot x(t)$ and $\ddot x(t)$ denote the first and second
derivatives of the vector $x(t)$ at $t$.

Using (5.15), the inverse Mellin transform, and the
residue theorem of Cauchy, as well as analytic
depoissonization of Jacquet and Szpankowski [19], we finally
obtain

(5.16) 

E\m = — nlogn + n f ^P— + | 
1 " J H S \X(-l) 2A(-1) J 

+ n + 1 + *G°g»)) + o(n) 

where $\Phi(x)$ is a periodic function of small amplitude
under a certain rationality condition (and zero otherwise);
see [20] for details.

The asymptotic analysis of the variance follows the
same pattern; however, it is more involved. Our analysis
of the Poisson variance $\tilde v_i(\lambda) = \mathrm{Var}(L^i_{N_\lambda})$ is based on
the following decomposition:



Lemma 5.2. For any A > and i € E we have 
(5.17) 

Vi(X) = v Q (Xp l0 ) + vi(Xpa) + 2Xp iQ i>' (Xpi ) 



2Xpiiv[(Xpii) + 2Ae x (i>o(Xp i0 ) + &x(Xpa)) 



A(l 



v ) + A 2 e- A (2-e- A ) 



where v[,i € E, denotes the derivative of vi, i.e. for 
z > 



71=1 



(n-1) 



The Mellin transform v*(s) of v,(A) is 

= PioXO) +PiTX( s ) - 2s P«)X( s ) 

-2 SPa v*( s )-r( s + i) + ^( s ) 

with F*(s) the Mellin transform of e~ x (i>' (\p iQ ) + 
2Xpni>' 1 (Xpii) + A 2 (2 — e~ A )). Thus, the column vector 
v*(s) := (uq (s), v\ (s)) satisfies the following algebraic 
equation 

v*(s) =P(s)v*(s) - 2sP{s - l)v*(s) 
-7(s + l) + F*(s) 

where F*(s) := (F *(s), F^s)). Then, as we did before 
for the mean analysis, we obtain 



v(s) 



2sT{s + 1)(tt(s), P{s - l)-0(s))V(s) 
+ 0(1/(1 -A(a)). 



After further computations we find that the Poisson 
variance 5(A) = Var(L^ ) is 



v(X) 



' -Alog 2 A + 



A 



A 2 (-l) 
+ 0(A) 



A(-l) 

2A 3 (-1) ^ A 2 (-l) 



A log A 



for some explicitly computable constant A. Finally, 
with depoissonization, cf. [40] . we obtain 



Var(i^) = v(n) - n[v'{n)] 2 
A(-l)-A 2 (-l) 



A3(-l) 



n log n + 0(n) 



proving Theorem |5.1 



6 Asymptotic Normality 

Our main result is the asymptotic normality of the 
external path length: 

Theorem 6.1. For the external path length of a binary
trie under the Markov source model with conditions
(2.2) we have

(6.18)    $\frac{L_n^\mu - \mathbb{E}[L_n^\mu]}{\sqrt{n \log n}} \stackrel{d}{\longrightarrow} \mathcal{N}(0,\sigma^2), \qquad (n \to \infty),$

where $\sigma^2 > 0$ is independent of the initial distribution $\mu$
and given by (5.10).



As in the analysis of the mean, we first derive limit
laws for $L_n^0$ and $L_n^1$ and then transfer these to a limit law
for $L_n^\mu$ via (3.4). We abbreviate for $i \in \Sigma$ and $n \in \mathbb{N}_0$

$\nu_i(n) := \mathbb{E}[L_n^i], \qquad \sigma_i(n) := \sqrt{\mathrm{Var}(L_n^i)}.$

Note that we have $\nu_i(0) = \nu_i(1) = \sigma_i(0) = \sigma_i(1) = 0$
and $\sigma_i(n) > 0$ for all $n \ge 2$. We define the standardized
variables by

(6.19)    $Y_n^i := \frac{L_n^i - \mathbb{E}[L_n^i]}{\sigma_i(n)}, \qquad n \ge 2,$

and $Y_0^i := Y_1^i := 0$. Then we have:



PROPOSITION 6.2. For both sequences $(Y_n^i)_{n\ge0}$, $i \in \Sigma$,
we have convergence in distribution:

(6.20)    $Y_n^i \stackrel{d}{\longrightarrow} \mathcal{N}(0,1), \qquad (n \to \infty).$

We now present a brief streamlined road map of the
proof.



Step 1. Normalization. From the system (3.5)-(3.6),
where we denote there $I_n^0 := I_n$ and $I_n^1 := J_n$, and the
normalization (6.19) we obtain for all $n \ge 2$,

(6.21)    $Y_n^i \stackrel{d}{=} \frac{\sigma_i(I_n^i)}{\sigma_i(n)}\, Y^i_{I_n^i} + \frac{\sigma_{1-i}(n - I_n^i)}{\sigma_i(n)}\, Y^{1-i}_{n - I_n^i} + b_i(n),$

where

$b_i(n) = \frac{1}{\sigma_i(n)} \left( n + \nu_i(I_n^i) + \nu_{1-i}(n - I_n^i) - \nu_i(n) \right),$

and in (6.21) we have that $(Y_0^0,\ldots,Y_n^0)$, $(Y_0^1,\ldots,Y_n^1)$
and $(I_n^0, I_n^1)$ are independent. It can be shown by
our expansions of the means $\nu_i(n)$ and the Lipschitz
property from Proposition 4.2 that we have $b_i(n) \to 0$
as $n \to \infty$ for both $i \in \Sigma$, e.g., in the $L_3$-norm, which
below will be technically sufficient. Furthermore, the
asymptotics of the variance from Theorem 5.1 imply
together with the strong law of large numbers that the
coefficients in (6.21) converge:

$\frac{\sigma_i(I_n^i)}{\sigma_i(n)} \to \sqrt{p_{ii}}, \qquad \frac{\sigma_{1-i}(n - I_n^i)}{\sigma_i(n)} \to \sqrt{1 - p_{ii}},$

where we recall that $\sigma_i(I_n^i)$ is the standard deviation of
$L^i_{I_n^i}$ conditioned on $I_n^i$, hence, in particular, a random
variable.

Step 2. System of limit equations. The convergence
of the coefficients in (6.21) suggests, by passing
formally with $n \to \infty$, that limits $Y^0$ and $Y^1$ of $Y_n^0$ and
$Y_n^1$, if they exist, should satisfy the system of recursive
distributional equations

(6.22)    $Y^0 \stackrel{d}{=} \sqrt{p_{00}}\, Y^0 + \sqrt{1 - p_{00}}\, Y^1,$

(6.23)    $Y^1 \stackrel{d}{=} \sqrt{1 - p_{11}}\, Y^0 + \sqrt{p_{11}}\, Y^1,$

where $Y^0$ and $Y^1$ are independent on the right
hand sides. Clearly, centered normally distributed
$Y^0$ and $Y^1$ with identical variances solve the system
(6.22)-(6.23). The task now is to show that $Y_n^0$ and
$Y_n^1$ converge in distribution towards these solutions $Y^0$
and $Y^1$ respectively.
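The claim that centered normal variables with a common variance solve the limit system can be checked in one line. For independent $Y^0, Y^1$ with distribution $\mathcal{N}(0,\sigma^2)$ the right hand sides are again centered normal (as linear combinations of independent normal variables), and their variances are

```latex
\operatorname{Var}\bigl(\sqrt{p_{00}}\,Y^0 + \sqrt{1-p_{00}}\,Y^1\bigr)
  = p_{00}\,\sigma^2 + (1-p_{00})\,\sigma^2 = \sigma^2 ,
\qquad
\operatorname{Var}\bigl(\sqrt{1-p_{11}}\,Y^0 + \sqrt{p_{11}}\,Y^1\bigr)
  = (1-p_{11})\,\sigma^2 + p_{11}\,\sigma^2 = \sigma^2 .
```

Hence both sides of each equation have the same normal distribution; the squared coefficients summing to one in each equation is what makes this work.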

Step 3. The operator of distributions. Our approach
is based on the system (6.22)-(6.23) of limit
equations together with an associated contracting
operator (map) on the space of probability distributions as
follows: We denote by $\mathcal{M}_s(0,1)$ the space of all
probability distributions on the real line with mean 0,
variance 1 and finite absolute moment of order $s$. Later
$2 < s \le 3$ will be an appropriate choice for us. With
the abbreviation $\mathcal{M}^2 := \mathcal{M}_s(0,1) \times \mathcal{M}_s(0,1)$ we define
the map

$T : \mathcal{M}^2 \to \mathcal{M}^2,$
$(\tau_0, \tau_1) \mapsto \left( \mathcal{L}\bigl(\sqrt{p_{00}}\, W^0 + \sqrt{1 - p_{00}}\, W^1\bigr),\ \mathcal{L}\bigl(\sqrt{1 - p_{11}}\, W^0 + \sqrt{p_{11}}\, W^1\bigr) \right),$

where $W^0$, $W^1$ are independent with distributions
$\mathcal{L}(W^i) = \tau_i$ for both $i \in \Sigma$.

This allows a measure theoretic reformulation of
solutions of (6.22)-(6.23) that is convenient
subsequently: Random variables $(Y^0, Y^1)$ solve the system
(6.22)-(6.23) if and only if their pair of distributions
$(\mathcal{L}(Y^0), \mathcal{L}(Y^1))$ is a fixed point of $T$. Hence the
identification of fixed-points and domains of attraction
of such fixed-points plays an important role in the
asymptotic behavior of our sequences $(Y_n^0)_{n\ge0}$ and
$(Y_n^1)_{n\ge0}$ and is a core part of our proof.

Step 4. The Zolotarev metric. In accordance
with the general idea of the contraction method we will
endow the space $\mathcal{M}^2$ with a complete metric such that $T$
becomes a contraction with respect to this metric. The
issue of fixed-points is then reduced to the application
of Banach's fixed-point theorem.

As a building block we use the Zolotarev metric on
$\mathcal{M}_s(0,1)$. It has been studied in the context of the
contraction method systematically in [29]. We only
need the following properties, see Zolotarev [4Tl[4j]: For 
distributions $\mathcal{L}(X)$, $\mathcal{L}(Y)$ on $\mathbb{R}$ the Zolotarev distance
$\zeta_s$, $s > 0$, is defined by

(6.24)    $\zeta_s(X,Y) := \zeta_s(\mathcal{L}(X), \mathcal{L}(Y)) := \sup_{f \in \mathcal{F}_s} \left| \mathbb{E}[f(X) - f(Y)] \right|,$

where $s = m + \alpha$ with $0 < \alpha \le 1$, $m \in \mathbb{N}_0$, and

$\mathcal{F}_s := \{ f \in C^m(\mathbb{R},\mathbb{R}) : |f^{(m)}(x) - f^{(m)}(y)| \le |x - y|^\alpha \},$

the space of $m$ times continuously differentiable
functions from $\mathbb{R}$ to $\mathbb{R}$ such that the $m$-th derivative is Hölder
continuous of order $\alpha$ with Hölder-constant 1. We have
that $\zeta_s(X,Y) < \infty$, if all moments of orders $1,\ldots,m$
of $X$ and $Y$ are equal and if the $s$-th absolute moments
of $X$ and $Y$ are finite. Since later on only the case
$2 < s \le 3$ is used, for finiteness of $\zeta_s(X,Y)$ it is thus
sufficient for these $s$ that mean and variance of $X$ and
$Y$ coincide and both have a finite absolute moment of
order $s$. Convergence in $\zeta_s$ implies weak convergence on
$\mathbb{R}$. Furthermore, $\zeta_s$ is $(s,+)$ ideal, i.e., we have

$\zeta_s(X + Z, Y + Z) \le \zeta_s(X,Y),$
$\zeta_s(cX, cY) = c^s\, \zeta_s(X,Y)$

for all $Z$ being independent of $(X,Y)$ and all $c > 0$.

Now, to measure distances on the product space
$\mathcal{M}^2$ we define for $(\tau_0,\tau_1), (\varrho_0,\varrho_1) \in \mathcal{M}^2$ the distance

$\zeta_s^\vee((\tau_0,\tau_1),(\varrho_0,\varrho_1)) := \zeta_s(\tau_0,\varrho_0) \vee \zeta_s(\tau_1,\varrho_1).$

Here and later on, we use the symbols $\vee$ and $\wedge$ for max
and min respectively.

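Step 5. The contraction property. We directly
obtain that $T$ is a contraction in $\zeta_s^\vee$ from the property
that $\zeta_s$ is $(s,+)$ ideal: Denoting the components of $T$
by $T_0$ and $T_1$ we have

$\zeta_s(T_0(\tau_0,\tau_1), T_0(\varrho_0,\varrho_1))$
$\quad \le p_{00}^{s/2}\, \zeta_s(\tau_0,\varrho_0) + (1 - p_{00})^{s/2}\, \zeta_s(\tau_1,\varrho_1)$
$\quad \le \left( p_{00}^{s/2} + (1 - p_{00})^{s/2} \right) \zeta_s^\vee((\tau_0,\tau_1),(\varrho_0,\varrho_1)),$

and similarly

$\zeta_s(T_1(\tau_0,\tau_1), T_1(\varrho_0,\varrho_1))$
$\quad \le (1 - p_{11})^{s/2}\, \zeta_s(\tau_0,\varrho_0) + p_{11}^{s/2}\, \zeta_s(\tau_1,\varrho_1).$

Hence together with $\xi := \max_{i\in\Sigma} \left( p_{ii}^{s/2} + (1 - p_{ii})^{s/2} \right)$ we
obtain that

(6.25)    $\zeta_s^\vee(T(\tau_0,\tau_1), T(\varrho_0,\varrho_1)) \le \xi\, \zeta_s^\vee((\tau_0,\tau_1),(\varrho_0,\varrho_1)).$

Since $p_{ii} \in (0,1)$ by assumption (2.2) we have $\xi < 1$ for
all $s > 2$. On the other hand, it is known that one only
obtains finiteness of $\zeta_s$ on $\mathcal{M}_s(0,1)$ for $s \le 3$, hence
(6.25) is only meaningful for $s \le 3$. Thus, altogether,
our choice of $s$ is $2 < s \le 3$. For these $s$ we obtain that
$T$ is a contraction in $\zeta_s^\vee$.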
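The contraction factor from this step is easy to evaluate for a concrete transition matrix. The sketch below computes the maximum over $i$ of $p_{ii}^{s/2} + (1 - p_{ii})^{s/2}$; the function name and the example matrix are illustrative choices, not taken from the paper.

```python
def contraction_factor(p, s):
    """max over i in {0, 1} of p_ii^(s/2) + (1 - p_ii)^(s/2)."""
    return max(p[i][i] ** (s / 2) + (1 - p[i][i]) ** (s / 2) for i in (0, 1))
```

For $s = 2$ the factor equals 1 for every matrix, so no contraction is obtained, while any exponent $s > 2$ pushes it strictly below 1 as soon as both diagonal entries lie in $(0,1)$. This is why the range of admissible $s$ starts strictly above 2.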

Step 6. Convergence of the $Y_n^i$. An intuition
why contraction properties of the map $T$ lead to
convergence of the $Y_n^i$ towards the unique fixed-point
$(\mathcal{N}(0,1), \mathcal{N}(0,1))$ of $T$ in $\mathcal{M}^2$ is as follows: The map
$T$ serves as a limit version of our recurrence system
(6.21). Since in this recurrence system we could replace
the $Y^i_{I_n^i}$ and $Y^{1-i}_{n - I_n^i}$ on the right hand side by the
recurrence (6.21) itself, iterating these replacements leads
approximately to an iteration of the map $T$. However,
by Banach's fixed-point theorem, the iteration of
$T$ applied to any starting point in $\mathcal{M}^2$ converges to the
unique fixed-point of $T$ in the metric $\zeta_s^\vee$.

Hence, the problem of proving the convergence of
the $Y_n^i$ to the standard normal distribution (the
fixed-point) is reduced to the following technical task: Verify
that not only the iterations of $T$ itself converge in the
metric $\zeta_s^\vee$ to the fixed-point, but also the iterations
of the approximations of $T$ that constitute the recurrence of
the $Y_n^i$ converge in $\zeta_s^\vee$.

Once this is settled, we use that convergence in
$\zeta_s$ is strong enough to imply weak convergence and that
$(\mathcal{N}(0,1), \mathcal{N}(0,1))$ is the unique fixed point of $T$. This
finally yields Proposition 6.2. A detailed proof is given
in the full paper version of this extended abstract.



Step 7. Transfer to arbitrary initial distributions.
Finally, we prove Theorem 6.1. For this, we
have to transfer the convergence of the $Y_n^i$ from
Proposition 6.2 to the convergence of the normalization of
$L_n^\mu$ via (3.4). Recall that in (3.4), $K_n$ is a binomial
$B(n,\mu_0)$ distributed random variable. We write

$\frac{L_n^\mu - \mathbb{E}[L_n^\mu]}{\sqrt{n \log n}} = \frac{L_n^\mu - \nu_0(K_n) - \nu_1(n - K_n)}{\sqrt{n \log n}} + \frac{\nu_0(K_n) + \nu_1(n - K_n) - \mathbb{E}[L_n^\mu]}{\sqrt{n \log n}}.$



By the Lemma of Slutsky, see, e.g. [TJ Theorem 3.1], it 
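is sufficient to show, as $n \to \infty$,

(6.26)    $\frac{L_n^\mu - \nu_0(K_n) - \nu_1(n - K_n)}{\sqrt{n \log n}} \stackrel{d}{\longrightarrow} \mathcal{N}(0,\sigma^2),$

(6.27)    $\frac{\nu_0(K_n) + \nu_1(n - K_n) - \mathbb{E}[L_n^\mu]}{\sqrt{n \log n}} \stackrel{P}{\longrightarrow} 0.$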



For showing (6.26) note that by Proposition 6.2 and Theorem 5.1
$(L_n^i - \mathbb{E}[L_n^i])/\sqrt{n \log n} \to \mathcal{N}(0,\sigma^2)$ in distribution for both
$i \in \Sigma$. We set $A_n := [\mu_0 n - n^{2/3}, \mu_0 n + n^{2/3}] \cap \mathbb{N}_0$
and $A_n^c := \{0,\ldots,n\} \setminus A_n$. Then by Chernoff's bound
(or the central limit theorem) we have $P(K_n \in A_n) \to 1$.
For all $x \in \mathbb{R}$ we have with $\kappa_{n,j} := P(K_n = j)$

$P\left( \frac{L_n^\mu - \nu_0(K_n) - \nu_1(n - K_n)}{\sqrt{n \log n}} \le x \right)$
$\quad = \sum_{j \in A_n} \kappa_{n,j}\, P\left( \frac{L_j^0 - \nu_0(j)}{\sqrt{n \log n}} + \frac{L^1_{n-j} - \nu_1(n-j)}{\sqrt{n \log n}} \le x \right) + o(1).$

For $j \in A_n$ we have $\sqrt{j \log j}/\sqrt{n \log n} \to \sqrt{\mu_0}$ and
$\sqrt{(n-j)\log(n-j)}/\sqrt{n \log n} \to \sqrt{1 - \mu_0}$. Hence, we
have $(L_j^0 - \nu_0(j))/\sqrt{n \log n} \to \mathcal{N}(0, \mu_0\sigma^2)$ and
$(L^1_{n-j} - \nu_1(n-j))/\sqrt{n \log n} \to \mathcal{N}(0, (1-\mu_0)\sigma^2)$ in distribution
and the two summands are independent. Together,
denoting by $N_{0,\sigma^2}$ an $\mathcal{N}(0,\sigma^2)$ distributed random
variable we obtain

$P\left( \frac{L_n^\mu - \nu_0(K_n) - \nu_1(n - K_n)}{\sqrt{n \log n}} \le x \right)$
$\quad = o(1) + \sum_{j \in A_n} \kappa_{n,j} \left( P(N_{0,\sigma^2} \le x) + o(1) \right)$
$\quad \to P(N_{0,\sigma^2} \le x),$

where the latter convergence is justified by dominated
convergence. This shows (6.26).



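To establish the convergence in probability in (6.27)
note that (3.4) implies

$\mathbb{E}[L_n^\mu] = \mathbb{E}[\nu_0(K_n)] + \mathbb{E}[\nu_1(n - K_n)].$

Hence, with the notation (4.7) and $g(x) := x \log x$ for
$x \ge 0$ and $\|\cdot\|_1$ denoting the $L_1$-norm we have

$\frac{1}{\sqrt{n \log n}} \left\| \nu_0(K_n) + \nu_1(n - K_n) - \mathbb{E}[L_n^\mu] \right\|_1$
$\quad = \frac{1}{\sqrt{n \log n}} \left\| \nu_0(K_n) - \mathbb{E}[\nu_0(K_n)] + \nu_1(n - K_n) - \mathbb{E}[\nu_1(n - K_n)] \right\|_1$
$\quad \le \frac{1}{H \sqrt{n \log n}} \left\| g(K_n) - \mathbb{E}[g(K_n)] + g(n - K_n) - \mathbb{E}[g(n - K_n)] \right\|_1$
$\qquad + \frac{1}{\sqrt{n \log n}} \left\| f_0(K_n) - \mathbb{E}[f_0(K_n)] \right\|_1 + \frac{1}{\sqrt{n \log n}} \left\| f_1(n - K_n) - \mathbb{E}[f_1(n - K_n)] \right\|_1.$

With the concentration of the binomial distribution we
obtain

$\left\| g(K_n) - \mathbb{E}[g(K_n)] + g(n - K_n) - \mathbb{E}[g(n - K_n)] \right\|_1$
$\quad = n \left\| g\!\left(\tfrac{K_n}{n}\right) + g\!\left(\tfrac{n - K_n}{n}\right) - \mathbb{E}\!\left[ g\!\left(\tfrac{K_n}{n}\right) + g\!\left(\tfrac{n - K_n}{n}\right) \right] \right\|_1 = O(n^{1/2}).$

The terms $\| f_0(K_n) - \mathbb{E}[f_0(K_n)] \|_1$ and $\| f_1(n - K_n) -
\mathbb{E}[f_1(n - K_n)] \|_1$ are also of the order $O(n^{1/2})$ by a
self-centering argument. Altogether we have

$\frac{\left\| \nu_0(K_n) + \nu_1(n - K_n) - \mathbb{E}[L_n^\mu] \right\|_1}{\sqrt{n \log n}} = O\!\left( \frac{1}{\sqrt{\log n}} \right),$

which, by Markov's inequality, implies (6.27) as follows:
For any $\varepsilon > 0$ we have

$P\left( \left| \frac{\nu_0(K_n) + \nu_1(n - K_n) - \mathbb{E}[L_n^\mu]}{\sqrt{n \log n}} \right| > \varepsilon \right)$
$\quad \le \frac{1}{\varepsilon}\, \mathbb{E}\left[ \left| \frac{\nu_0(K_n) + \nu_1(n - K_n) - \mathbb{E}[L_n^\mu]}{\sqrt{n \log n}} \right| \right]$
$\quad = \frac{1}{\varepsilon \sqrt{n \log n}} \left\| \nu_0(K_n) + \nu_1(n - K_n) - \mathbb{E}[L_n^\mu] \right\|_1 \to 0.$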


7 Comparison with a multivariate approach 

We propose the use of systems of univariate recurrences
in this extended abstract. Note however, that known
limit theorems from the contraction method for
multivariate recurrences can as well be applied to the
bivariate random variable $Y_n := (Y_n^0, Y_n^1)$. (Technically
easiest is to keep the components $Y_n^0$ and $Y_n^1$
independent by working with independent $I_n^0$ and $I_n^1$.)
Applying such an approach as developed in [25], the system
(6.22)-(6.23) is now replaced by the bivariate recursive
distributional equation

(7.28)    $Y \stackrel{d}{=} A_1 Y + A_2 Y',$

where $Y$ and $Y'$ are independent and identically
distributed bivariate random variables and the matrices
$A_1, A_2$ are given by

$A_1 := \begin{pmatrix} \sqrt{p_{00}} & 0 \\ 0 & \sqrt{p_{11}} \end{pmatrix}, \qquad
 A_2 := \begin{pmatrix} 0 & \sqrt{1 - p_{00}} \\ \sqrt{1 - p_{11}} & 0 \end{pmatrix}.$

Any centered bivariate normal distribution solves the
latter fixed-point equation (7.28). In particular
Theorem 4.1 in [29] covers the arising bivariate recurrence,
cf. also condition (38) in [29], which is satisfied for $A_1$,
$A_2$ in (7.28).

However, for applying the contraction method in
such a multivariate form, an underlying contraction is
only implied for, see condition (25) in [25],

$\|A_1\|_{\mathrm{op}}^3 + \|A_2\|_{\mathrm{op}}^3 < 1,$

where $\|\cdot\|_{\mathrm{op}}$, here, is identical to the spectral radius of
the matrix. This imposes the additional condition

(7.29)    $(p_{00} \vee p_{11})^{3/2} + (1 - p_{00} \wedge p_{11})^{3/2} < 1$

to come up with a result similar to our Theorem 6.1.

Our new approach based on systems of univariate
recursive equations given above does not require any
further condition such as (7.29).
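To see how restrictive the additional condition of the multivariate approach is, one can simply evaluate it for given diagonal entries of the transition matrix. The following helper is a hypothetical illustration, reading condition (7.29) as $(p_{00} \vee p_{11})^{3/2} + (1 - p_{00} \wedge p_{11})^{3/2} < 1$:

```python
def multivariate_condition_holds(p00, p11):
    """Check (max(p00, p11))^(3/2) + (1 - min(p00, p11))^(3/2) < 1."""
    return max(p00, p11) ** 1.5 + (1 - min(p00, p11)) ** 1.5 < 1
```

For example, the pair $(p_{00}, p_{11}) = (0.7, 0.6)$ satisfies the condition, whereas $(0.9, 0.1)$ does not, although both matrices are admissible under (2.2).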